Abstract

We review a study of the Internet traffic properties. We analyze under what conditions the reported results 
could be reproduced. Relations of results of passive measurements and those of modelling are also discussed. An 
example of the first-order phase transitions in the Internet traffic is presented. 



1. Introduction 

For the Internet user the most important pa- 
rameter of the network is the speed at which one 
retrieves documents. Anyone browsing the Internet
for preprints and articles (xxx.lanl.gov,
publish.aps.org, elsevier.nl, iop.org, wspc.com, etc.), or
looking for the news and weather, booking tickets, 
making hotel reservations, etc. asks himself: why 
is the Web so slow? This is precisely the question 
which motivates our work. 

The Internet efficiency depends upon the two 
main aspects of the network, topology and trans- 
port. The topology and connectivity features of the 
Internet were discussed by Newman and Barabási
at this Conference [1,2], and our main subject is 
the review of the Internet transport investigation: 
its measurements, properties and modelling. 

The first set of properties is connected with the
natural characteristics of human activity. In this
connection, the daily working hours lead to a daily
periodicity of the web traffic, the work week leads to
a weekly periodicity (weekends!), the annual calendar
leads to an annual periodicity (winter and summer
vacations!), and so on. Holidays and important
events (political campaigns, etc.) can also affect 
the traffic. The latest and most sudden example is
the congestion of all news servers on September 11,
2001, just after the terrorist attack on America.

The second set of the web traffic properties is
connected with the fact that the path from one
point on the web (e.g., a user) to another point
(e.g., a server) is not stable. The path is often quite
complicated and consists of a number of routers,
channels, caches, etc., which change with time
because the network constantly develops [3,4]. This
reminds us of the ancient philosopher Heraclitus,
who asserted "You cannot step twice into the same
river". We could say the same about the Internet
river. The third set of properties, further complicating
the measurements, is connected with the dynamics
of the Internet traffic for many autonomous
systems, which leads to random changes in the
transport topology and therefore in the timing and
loading characteristics.

The knowledge obtained on the Internet traffic 
properties is reviewed in this article. 



2. Network traffic: 1/f noise and models

Network traffic has been the subject of intensive
investigations, especially in the last eight years after
WWW technology was invented. Nevertheless, we
still do not have a single simple answer to the main
question: why is the Web so slow?

One of the main difficulties of understanding 
the Internet nature is that it is neither centrally 
planned nor configured. It is a good example of 
the self-organised global structure which is still a 
self-growing one. The people who run the Internet
started in 2000 an annual workshop on passive and
active measurements (PAM) on the
Internet. The relation between passive and active
measurements is still not understood. Active mea- 
surements are easily understood but do not clearly 
predict actual Internet and Web performance; pas- 
sive Internet /Web measurements can reflect actual 
performance but can be hard to interpret. Pas- 
sive measurement is nothing but data acquisition 
to log-files of the transactions routinely done by 
the routers, switches, proxies, caches and worksta- 
tions. One could find this information in the corre- 
sponding log-files. The active measurements could 
be divided into several groups. The two extremes
are RTT measurements and modelling of
user activity. The RTT (round trip time) between two
nodes on the Internet can be obtained, e.g., using
the usual Unix command ping. The resulting
value can be used by many protocols for different
purposes, e.g., for path optimization or, more
often, just to check whether the given node is
accessible or not.

The distribution of RTT times is not trivial and
has been the subject of intensive investigation since the
pioneering paper of Csabai [5]. Analysing the results
of several hundreds of RTTs obtained in two
weeks between his workstation in Budapest, Hungary,
and an ftp server in Helsinki, Finland, Csabai
found that the power spectral density could be
described by the power law 1/f^{1.15} in a wide range of
frequencies f from 10^{-4} Hz to 0.5 Hz.
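
A spectral exponent of this kind can be estimated from an evenly sampled
RTT series by fitting a straight line to the periodogram on a log-log
scale. The Python sketch below is only an illustration under our own
assumptions, not the procedure of Ref. [5]: the names periodogram,
spectral_exponent and random_walk are hypothetical, the discrete Fourier
transform is evaluated directly, and the fit uses all positive frequencies
with equal weight.

```python
import cmath
import math
import random


def periodogram(x, dt=1.0):
    """Return (frequencies, power) for the positive-frequency DFT bins of x."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        # Direct O(n^2) discrete Fourier transform, adequate for short series.
        s = sum(c * cmath.exp(-2j * math.pi * k * t / n)
                for t, c in enumerate(centered))
        freqs.append(k / (n * dt))
        power.append(abs(s) ** 2 * dt / n)
    return freqs, power


def spectral_exponent(freqs, power):
    """Least-squares estimate of alpha in power ~ 1/f^alpha (log-log fit)."""
    pts = [(math.log(f), math.log(p)) for f, p in zip(freqs, power) if p > 0]
    mx = sum(a for a, _ in pts) / len(pts)
    my = sum(b for _, b in pts) / len(pts)
    sxy = sum((a - mx) * (b - my) for a, b in pts)
    sxx = sum((a - mx) ** 2 for a, _ in pts)
    return -sxy / sxx


def random_walk(n, seed=0):
    """Synthetic test signal (assumption): cumulative sum of Gaussian steps."""
    rng = random.Random(seed)
    x, total = [], 0.0
    for _ in range(n):
        total += rng.gauss(0.0, 1.0)
        x.append(total)
    return x
```

For a synthetic random walk the fitted exponent should come out close to 2,
and for uncorrelated noise close to 0; a measured RTT series would be passed
in place of the synthetic one, with dt set to the sampling interval.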

Actually, the self-similarity in computer traffic 
was found a little bit earlier by Leland et al. [6] for 
the packet flow density in several local Ethernet 
networks at the Bellcore Morristown Research and 
Engineering Center. They found "heavy tails" in 
the cumulative distribution function of the packet 
sizes. At that time none of the commonly used traf- 
fic models was able to capture this fractal-like be- 
havior. 

Takayasu et al. [7] developed a contact process
(CP) model of jam dynamics on the Internet. They
associated a particle with a non-jammed gateway,
which can reproduce another particle at a
neighboring site at a rate p and can annihilate (i.e.,
become a jammed gateway) spontaneously at a rate q, with
p + q < 1. They found that the survival probability
at the critical point (δ = 1 - p/q → 0) was proportional
to the inverse time, 1/t. Moreover, assuming
that the RTT is a two-valued function,
taking values h and 0, they showed that the
power spectrum is proportional to 1/f^α with the exponent
α bounded, 0 < α < 1. This result seems to be
supported by the analysis of Ethernet and Internet
traffic in a series of papers by Takayasu et al. [8].

Huisinga et al. [9] introduced a microscopic 
model for the packet transport on the Internet. 
Data are divided into small packets of a definite 
size. These data packets move, for fixed source and 
destination hosts, due to the structure of TCP/IP, 
along a temporally fixed route. Therefore, the 
transport between two specific hosts can be viewed 
as a one-dimensional process. The cellular automaton
model known as the Asymmetric Simple Exclusion Process
(ASEP) has an important property: the occurrence
of boundary-induced phase transitions [10].

Analysing the power spectrum of the travel
times, Huisinga et al. [9] identified three phases:
a free-flow phase characterized by white noise, a
congested phase with 1/f^{1/2} noise at low frequencies
and white noise at high frequencies, and a critical
load phase with 1/f noise at low frequencies
and white noise at high frequencies. They
concluded with the important observation that the
jamming properties are not related to the struc- 
ture of the network and rather connected with the 
paths with critical load. 
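
The boundary-induced transition can be illustrated with the simplest member
of this family of models. The sketch below is not the generalised model of
Ref. [9]; it is a plain totally asymmetric exclusion process on an open
chain with random-sequential updates, and the names simulate_tasep, alpha_in
and beta_out are our own. Packets enter at the first site with probability
alpha_in, hop to the right if the next site is empty, and leave the last
site with probability beta_out.

```python
import random


def simulate_tasep(length, alpha_in, beta_out, sweeps, seed=0):
    """Random-sequential TASEP on an open chain (requires sweeps >= 1).

    Returns (mean density, current), both measured over the second half
    of the run; the current is the number of exits per sweep.
    """
    rng = random.Random(seed)
    sites = [0] * length
    burn_in = sweeps // 2
    exited = 0
    density_sum = 0.0
    measured = 0
    for sweep in range(sweeps):
        for _ in range(length + 1):
            i = rng.randrange(-1, length)  # -1 stands for the injection bond
            if i == -1:
                # Injection into the first site if it is empty.
                if sites[0] == 0 and rng.random() < alpha_in:
                    sites[0] = 1
            elif i == length - 1:
                # Extraction from the last site if it is occupied.
                if sites[i] == 1 and rng.random() < beta_out:
                    sites[i] = 0
                    if sweep >= burn_in:
                        exited += 1
            elif sites[i] == 1 and sites[i + 1] == 0:
                # Bulk hop to the right, forbidden if the target is occupied.
                sites[i], sites[i + 1] = 0, 1
        if sweep >= burn_in:
            density_sum += sum(sites) / length
            measured += 1
    return density_sum / measured, exited / measured
```

With alpha_in = 0.2 and beta_out = 0.8 the chain should stay in a
low-density state controlled by the entry, while exchanging the two values
should drive it into a high-density (congested) state controlled by the
exit, although the bulk hopping rule is unchanged.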


3. Server load and latency times

Although the traffic models discussed in the previous
section seem to explain some peculiarities of
the Internet traffic at the critical path load, some
recently observed phenomena connected with the
critical load of servers (news servers!) are still not
understood.

Barford and Crovella [12,11] found a surprising
effect of server load. When the network is heavily
loaded (i.e., packet loss rates are high), it is not 
uncommon for a heavily loaded server to show a 
better mean response time than a lightly loaded 
server. Their measurements suggest that this may 
be because heavily loaded servers often show lower 
packet loss rates, and since packet losses have dra- 
matic effects on transfer latency, this can reduce 
the mean response time. 

In the rest of this section we analyse the probability
distribution function of the latency times
and arrive at the simplest experimental setup hav- 
ing the property observed by Barford and Crovella 
[12,11]. 

It is a commonly accepted picture (see, e.g., Figure
1 in the paper by Helbing et al. [13]) that the
distribution of download times (i.e., latency times)
is log-normal and that this property leads to the
1/f property of the Internet traffic.

In fact, the distribution function of latency times
is multipeaked [3]. The peaks could be associated
with two sets of factors. The first set is connected
with the different throughput of the particular
paths between the user workstation and the destination
servers. This fact is clearly visible for some
preprint and reprint library servers placed in the
Far East. The second set of factors is connected with
the traffic content. The speed of document retrieval
depends on the type of document: text, image, binary
file, audio, video, etc. Indeed, it is practically
possible to decompose the distribution function
into more elementary ones by analysing the
content of the proxy server log files and using the
above-mentioned two sets of factors. Nevertheless,
even in this case, the distribution function of an
elementary process, like taking files from a particular
archive (say, the Los Alamos Archive) to a given
workstation, more often than not demonstrates
two peaks [14].

The same effect of multipeaked distributions
could also be obtained by analysing RTT times on a
short path between a workstation and a border router.
Figure 1 shows a histogram of RTT times between
a workstation (connected to the 100 Mbps Ethernet
fiber-optic campus network of Chernogolovka Science
Park (AS 9113)) and the border Cisco router
BNS045 (147.45.20.221) of FREEnet. AS 9113
and BNS045 are connected by a 2 Mbps ATM channel.
The workstation and the border router are separated
by only one LAN router. The large and narrow
peak at about 7 ms is associated with the round
trip time of a 64 byte ping packet in the path connecting
three devices with an empty 2 Mbps ATM
channel. The next, wide peak at about 750 ms
could be associated with router congestion.
The fact that these peaks are well separated is
due to some particular features of the TCP protocol.
This picture is a clear demonstration of the
nonlinear response of the router.

Moreover, we found [14] that RTT times inside a
single workstation usually exhibit two peaks in their
probability distribution function. We
performed an analysis of the RTT times of the Unix
ping command on the internal network interface

ping -i S -s 56 127.0.0.1

where S is the interval in seconds between two
consecutive ping packets of the size of 56 bytes
(the ping packet also contains 8 additional bytes,
i.e., the total packet size is 64 bytes). We vary the
interval S and accumulate data for S = 1, 10, 20
and 50 seconds. Figure 2 shows histograms P(t_R)
of RTT times calculated in intervals of 0.004 ms.
These histograms were obtained from the results of
50000 pings grouped in 5 groups and then averaged.
Fluctuations from one group to another are
small.
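
The binning and group averaging described above can be sketched in a few
lines of Python. This is a hedged illustration, not the analysis code of
Ref. [14]: the function names are hypothetical, and the regular expression
assumes that each reply line of ping contains a field of the form
"time=1.285 ms", as printed by common Linux versions of the command.

```python
import re
from collections import Counter

# Assumed reply-line format: "... icmp_seq=1 ttl=64 time=1.285 ms"
RTT_PATTERN = re.compile(r"time[=<]([0-9.]+)\s*ms")


def parse_rtts(ping_output):
    """Extract RTT values in ms from the text printed by ping."""
    return [float(m.group(1)) for m in RTT_PATTERN.finditer(ping_output)]


def histogram(rtts, bin_width):
    """Map bin index k (bin [k*w, (k+1)*w)) to the fraction of samples in it."""
    counts = Counter(int(t // bin_width) for t in rtts)
    n = len(rtts)
    return {k: c / n for k, c in sorted(counts.items())}


def averaged_histogram(groups, bin_width):
    """Average the normalized histograms of several groups of RTT values."""
    hists = [histogram(g, bin_width) for g in groups]
    bins = sorted(set().union(*hists))
    return {b: sum(h.get(b, 0.0) for h in hists) / len(hists) for b in bins}
```

In the setting of the text, one would split the 50000 RTT values into 5
groups of 10000 and call averaged_histogram(groups, 0.004); the spread of
the individual histograms around the average then gives a rough measure of
the group-to-group fluctuations.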

All histograms have two peaks, placed at about
t_R^{(1)} = 1.28 - 1.29 ms and t_R^{(2)} = 1.35 - 1.36 ms.
For the pings with interval S = 1 s there is a
higher probability that the measured RTT time will
be about t_R^{(1)}. For the pings with interval S = 50
s this probability is higher at the value of about
t_R^{(2)}. For the intermediate interval between pings
S = 20 s both peaks have approximately the same
height, and the results t_R^{(1)} or t_R^{(2)} of the
measurement will be equally probable.

It is better to plot RTT times ranked in ascending
order, as shown in Figure 3. Exchanging the axes and
their directions, one finds that this curve, after
normalization, is nothing but the cumulative distribution
function of RTT times. In fact, the curves
in Figure 2 could be obtained from Figure 3 by
differentiation. Figures 2-3 clearly demonstrate
that by varying the interval between pings
we have some kind of "first-order phase transition"
between states characterised by RTT time values
t_R^{(1)} and t_R^{(2)}. It is not a true phase transition
because we could not associate any order parameter
with the process.
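
The relation between the ranked plot and the histogram can be made explicit
with a short Python sketch. The function names and the choice of a uniform
grid are our assumptions: the ranked values define an empirical cumulative
distribution function, and a finite difference of that function over bins
of a given width gives back a histogram-like density.

```python
import bisect


def rank_rtts(rtts):
    """RTT values ranked in ascending order (the curve of a ranked plot)."""
    return sorted(rtts)


def cdf_at(ranked_rtts, t):
    """Empirical CDF: fraction of samples with RTT <= t."""
    return bisect.bisect_right(ranked_rtts, t) / len(ranked_rtts)


def density(ranked_rtts, t_min, bin_width, n_bins):
    """Finite-difference derivative of the empirical CDF on a uniform grid."""
    edges = [t_min + j * bin_width for j in range(n_bins + 1)]
    cdf = [cdf_at(ranked_rtts, e) for e in edges]
    return [(cdf[j + 1] - cdf[j]) / bin_width for j in range(n_bins)]
```

A jump of the ranked curve from one plateau to the other appears in the
density as two separated peaks, and the relative length of the plateaus
gives the relative weight of the peaks.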

The effect we found is stable and reproducible.
We obtain the same behaviour for a number of
Unix workstations of different types, disconnected
from the network and not running any processes
except those of a minimal configuration. The only
differences are the values of the RTT times t_R^{(1)}
and t_R^{(2)} and of the interval S. So, this effect
could be considered a universal one.

We associate this effect with the cache memory
organization. It seems that the difference between
the values of t_R^{(1)} and t_R^{(2)} is the time necessary
to load the ping process into the cache memory.
The Unix systems were running some processes which
could oust the ping procedure from the cache and thus
increase the RTT time. A detailed analysis will be
published elsewhere [14], as well as a model of
the process.

We analysed the power spectrum of the RTT
signals [5] and found that in all cases it could be
approximated by the 1/f^α law with α = 2 at
low frequencies and showed white noise
at high frequencies. Nevertheless, this analysis
has to be considered with caution; we found
in one measurement that just one fluctuation in
RTT time, which gave t_R = 106 ms in
an experiment with S = 50 s, changed our
power spectrum drastically. Excluding this enormous
fluctuation, all results are very stable and
reproducible.



4. Discussion 

We have to note that the effect of the first-order
phase transition, which we attribute to the influence
of the cache memory, could be even more universal.
In fact, all servers usually use cache memory,
and the effect of the higher productivity of
the servers [12,11] under heavy load could be
explained as an effect of heavy cache memory
usage as well.

Most of the models of Internet traffic are based 
on the assumption that the routers are nothing but 
queues. It seems that this is not always the case 
and more sophisticated nonlinear models of the elements
of the network should be considered. Most
of the nodes regarded as just single routers nowadays
consist of several devices which separately work
as switches, or routers, or name servers, or proxy-cache
servers, with rather complicated intercommunications
and interactions.

Fig. 1. Typical histogram of RTT times (in ms) between
workstation and router BNS045 calculated as number of
RTT time values within an interval of 20 ms.

Fig. 2. Histograms of internal ping RTT times calculated
in intervals of 0.004 ms for the interval times between two
ping packets S = 1 s (solid line), S = 10 s (dashed line),
S = 20 s (dotted line), and S = 50 s (dash-dotted line).

Fig. 3. Internal ping RTT times ranked ascendingly for the
interval times between two ping packets S = 1 s (solid
line), S = 10 s (dashed line), S = 20 s (dotted line), and
S = 50 s (dash-dotted line).